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Abstract — We present a quasi-conjugate Bayes approach for estimating Generalized 
Pareto Distribution (GPD) parameters, distribution tails and extreme quantiles within the 
Peaks-Over-Threshold framework. Damsleth conjugate Bayes structure on Gamma distri- 
butions is transfered to GPD. Posterior estimates are then computed by Gibbs samplers 
with Hastings- Metropolis steps. Accurate Bayes credibility intervals are also defined, they 
provide assessment of the quality of the extreme events estimates. An empirical Bayesian 
method is used in this work, but the suggested approach could incorporate prior informa- 
tion. It is shown that the obtained quasi-conjugate Bayes estimators compare well with 
the GPD standard estimators when simulated and real data sets are studied. 
Key words — Extreme quantiles, Gamcon II distribution, Generalized Pareto Distribu- 
tion, Gibbs Sampler, Peaks over thresholds (POT). 
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1 Introduction 



Motivated by univariate tail and extreme quantile estimation, our goal is to develop new 
Bayesian procedures for making statistical inference on the shape and scale parameters 
of Generalized Pareto Distributions (GPD) when used to model heavy tails and estimate 
extreme quantiles. 

Let us assume that observations of a studied phenomenon issued 
from independent and identically distributed (i.i.d.) random variables Xi, X 2 , . . . , X n 
with unknown common distribution function (d.f.) F. Suppose that one needs to estimate 
either extreme quantiles q±- p of F (i.e., 1 — F(qi- P ) = p with p G (0, 1/n] typically), or the 
extreme tail of F (i.e., 1 — F(x) for x > x n>n , where x^ n < . . . < x n>n denote the ordered 
observations). It is usually recommended to use the Peaks Over Threshold (POT) method 
(described for example in Davison and Smith (1990) or in the monographs Embrechts et al. 
(1997) and Reiss and Thomas (2001)), where only observations Xi exceeding a sufficiently 
high threshold u n are considered. In view of the theorem of Balkema and de Haan (1974) 
and Pickands (1975) the probability distribution of the k = k n positive excesses yj = 
x n -j+i,n - u n for j = 1, . . . , k, where x n - k , n < u n < x n - k +i, n , can be approximated for 
large u n by a GPD (7, a) distribution with scale parameter a > and shape parameter 7. 



with y + = max(y, 0), where y G M + when 7 > 0, and y G [0, — cr/7] when 7 < 0. 

The shape and scale parameters of the approximating GPD are estimated on the basis 
of the excesses above u n . The estimates are then usually plugged into the GPD d.f. and ex- 
treme quantile estimates are deduced. In this perspective, good estimation procedures for 
the shape and scale parameters of a GPD on the basis of approximately i.i.d. observations 
are necessary for accurate tail estimation. Estimating the shape and scale parameters, 7 
and ex, is not easy. Smith (1987) has shown that estimating GPD parameters by maximum 



The d.f. of GPD (7, a) is 




(1) 
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likelihood (ML) is a non regular problem for 7 < —1/2. Besides, the ML estimators may 
be numerically hardly tractable, see Davison and Smith (1990) and Grimshaw (1993). 
Moreover, the properties of MLE's meet their asymptotic theory only when the sample 
size (in our case the number k of exceedances) is larger than about 500. Many alternative 
estimators have been proposed: Hosking and Wallis (1987) linear estimators are based on 
probability weighted moments (PWM); they are easily computed and reasonably efficient 
when —0.4 < 7 < 0.4 approximately (see also the comparative study of Singh and Guo, 
1997). Castillo and Hadi (1997) have proposed other estimators based on the elemental 
percentile method (EPM), involving intensive computations. Their numerical simulations 
show that EPM estimators are more efficient than PWM ones only when 7 < 0. 

Semiparametric estimators of 7 along with related estimators of extreme quantiles 
have been intensively studied. For example, the Hill estimator presented by Hill (1975) 
and studied in Haeusler and Teugels (1985) and Beirlant and Teugels (1989), among many 
others. Two classic extensions of the Hill estimator are: The moment tail index estima- 
tor (denoted hereafter MTI(DEdH)) of Dekkers, Einmahl and de Haan (1989); The Zipf 
estimator, see Schultze and Steinebach (1996), and its generalization by Beirlant, Dierckx 
and Guillou (2001), denoted hereafter ZipfG. Most of these semiparametric estimators do 
not perform much better than parametric ones when applied to sets of excesses. Only the 
ZipfG estimator seems to outperform the other ones. 

In this paper, a new Bayesian inference approach for GPD's with 7 > is introduced. 
In a number of application areas such as structural reliability (see for example Grimshaw, 
1993) and excess-of-loss reinsurance (see Reiss and Thomas, 2001), tail estimation based 
on small or moderate data sets is needed. In such situations Bayes procedures can be used 
to capture and take into account all available information including expert information 
even when it is loose. Moreover, in the realm of tail inference, evaluating the imprecision 
of estimates is of vital importance. Bootstrap methods have been suggested to assess this 
imprecision. But standard bootstrap based on larger values of ordered samples is known to 
be inconsistent, whereas standard bootstrap based on excess samples has not second-order 
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coverage accuracy and is imprecise when sample sizes are not extremely large (e.g., Bacro 
and Brito (1993), Caers, Beirlant and Vynckier, 1998). On the contrary, in the Bayesian 
context, credibility regions and marginal credibility intervals for GPD parameters and 
related high quantiles provide a non-asymptotic geometry of uncertainty directly based 
on outputs of the procedure, thus shortcutting bootstrap. For all these reasons (expert 
information, credibility intervals) easily implementable Bayesian inference procedures for 
GPD's are highly desirable to study excess samples in the scope of POT methodology. 

The restriction 7 > is not too damaging, since several major application areas are con- 
nected to heavy-tailed distributions. Our approach can also be tried for data issued from 
distributions suspected to lie in Gumbel's maximum domain of attraction (DA(Gumbel)) 
where 7 = 0. In the latter case, direct Bayesian analysis of the exponential distribution 
can be made in parallel (see Appendix A). 

For other papers on Bayesian approaches to high quantile estimation, see, e.g., Coles 
and Tawn (1996), Coles and Powell (1996), Reiss and Thomas (1999), Tancredi et al. 
(2002), Bottolo et al. (2003), and the monographs Reiss and Thomas (2001) and Coles 
(2001) along with references therein. 

Our starting point is a representation of heavy-tailed GPD's as mixtures of exponential 
distributions with a gamma mixing distribution. Since the Bayesian conjugate class for 
gamma distributions is documented (Damsleth, 1975) we only have to transfer it to GPD's. 
As described in Section 2, this provides a natural Bayesian quasi-conjugate class for heavy- 
tailed GPD's. Even though Bayes estimators have no analytical expressions, the quasi- 
conjugate structure makes the Bayes computations very simple, the convergence of the 
MCMC algorithms very quick and gave a high parsimony to the global approach with an 
intuitive interpretation of the hyperparameters. 

Section 3 compares our Bayes estimates to ML, PWM, moment tail index MTI(DEdH) 
and generalized Zipf (ZipfG) estimates on excess samples through Monte Carlo simulations. 
Section 4 is devoted to benchmark real data sets. Finally, Section 5 lists some conclusions 
and presents forthcoming research projects. 
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2 Bayesian inference for GPD parameters 

The standard parameterization of heavy-tailed GPD distributions described by (pQ) when 
7 > is now replaced by a more convenient one depending on the two positive parameters 
a = I/7 and = cr/7. The re-parameterized version GPD(a, (3) has probability density 
function (p.d.f.) 



We assume in the following that we have observations y = (yx, . . . , yk) which are realiza- 
tions of i.i.d. random variables Yj_, . . . , Y*. approximately issued from ([2]). Typically, they 
represent excesses above some sufficiently high threshold u. The latter means that for 
each j < k, there exists an integer i < n such that Yj = X{ — u, Xi > u, where the 
X[s are assumed i.i.d. and issued from a distribution in Frechet's maximum domain of 
attraction: DA(Frechet). Remark that the case where the common distribution of the X[s 
is in DA(Gumbel) can also be covered by considering the limiting situation a — > +00 and 
— > +00 with (3 /a — > a > (see Appendix A). 



Our starting point is the following mixture representation for 02), see Reiss and Thomas 
(2001) page 157: 



where for z > 0, g(z\ a, (3) = [f3 a /T(a)} z a ~ 1 e~ l3z is the density of the Gamma(a, (3) 
distribution with shape and scale parameters a and (3. The previous representation stands 
only in DA(Frechet) as a and (3 have to be non-negative (as parameters of a gamma 
distribution) which implies 7 > 0. 

There is no Bayesian conjugate class for GPD's. Nevertheless, as shown below, the 
mixture form (E]) allows us to make use of the conjugate class for gamma distributions to 
construct a suitable quasi-conjugate class for GPD's. 




(2) 




(3) 
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2.1 Transferring conjugate structure from Gamma to GPD 

According to Damsleth (1975), the description of the conjugate class for gamma distribu- 
tions relies on the so-called type II Gamcon distributions: for x > 0, the density of the 
Gamcon II(c, d) distribution with parameters c > 1 and d > is 

£m (*) = C T ( dx + !) ( r W)" d > ( 4 ) 

where J Cj< 2 = J °° T(dx + l)(r(x))" d (cc?)~ d:E dx. Let in the following z = (z\, . . . , Zk) denote 
a A;-sample of realizations of i.i.d. random variables Z 1 , . . . , Z k issued from Gamma(a, j3). 
According to Damsleth (1975), Theorem 2, the conjugate prior density on (a, (5) with 
hyperparameters 5 > and r] > fj, > is given by ng^^a, (5) = ir(a) tt (j3 \ a) 
where ir(a) is the density of Gamcon II(c = rj/ //, d = 5) and n(f3\ a) is the density of 
Gamma(fo + 1, 5rj). Then, the conditional density of a given z, denoted n(a\ z), is 
Gamcon II(?////, 5') with 



k \ V(<5 + fe) 



and the conditional density of /3 given ct and z, denoted 7r(/3| a, z), is Gamma(J'a + 1, 5'r)'). 
Remark 1 . - The hyperparameters i] and /i act on the sufficient statistics s 1 = Yli=i z i 
and s 2 = Yli=i^ nz ii whereas 5 tunes the importance of these modifications. When 5 is a 
positive integer, the introduction of these prior distributions can loosely be interpreted as 
artificially adding S observations with arithmetic mean i] in si and another 5 observations 
with geometric mean \x in s 2 with r\j '/J, — c> 1. □ 
Now the question is: How can we deduce the posterior density of the GPD parameters 
(a, (5) given y (assumed in the following to be a k— sample from GPD) from the gamma 
conjugate structure, starting with Damsleth prior density ^^(a, pi) ? 
In the remaining of this section the following notations are used: 

- Let 9 = (a, P). Consequently fg and gg will denote respectively p.d.f.'s of GPD(a, P) 
and Gamma(a, P) probability distributions. 

- The likelihood function of the observations y for f e writes /(y| 9) = Y\i=i f(Vi\0)- 
Similarly, the likelihood function of z for gg writes g(z\0) = Yli=i d{ z i \ 9)- 
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- We let 7r(9\ y) denote the posterior density of 9 given the observations y = (yi, . . . , y^) 
corresponding to fg and the prior density ir(9) = vr^ ^^ (Damsleth prior): 
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where U(y) = [ f (y| 9') tt(9') d9' . (6) 
Je 



fAy) 

- Similarly, n(9\ z) denotes the posterior density of 9 given z = (zx, . . . , Zk) corresponding 
to ge and Damsleth prior density ir: 

g(z\9) ir(9) 



7T W Z 



Qtt (z) 

- We further denote 



where g w (z) = [ g (z| 9') tt {9') d9' . (7) 
Je 



qAzly) = j^yT^F^ ' where ^yi z )=(n^) ex p(-E^j- ( 8 ) 

Thus, the posterior distribution of 9 given y is a mixture of the posterior distributions 
of 9 given z with mixing density q n (z \ y): 



7T I 



y) = J ?*(z|y) tt(0|z) tfa. (9) 



It follows that each posterior moment given y is the integral of the corresponding posterior 
moment given z with respect to qv(z|y). Unfortunately, the functions g n (see (J?])-®) 
and z i — y q n (z\y) (see (jE]H©) are not expressible in analytical close form. Therefore, 
a numerical algorithm is needed. The mixture representation ([9]) allows us to design a 
simple and efficient Gibbs sampler. It is presented in the next subsection. 

2.2 Gibbs sampling 

A Gibbs sampler is used to get approximate simulations from the posterior density of 9 
given y, 7r(9\y). Damsleth's priors are used for 9 = (a, 0). The proposed sampler generates 
a Markov chain whose equilibrium density (denoted 7r(z, 0|y)) is the joint density of 
(Z, 9) conditionally on Y = y where Y and Z denote respectively random vectors of k 
independent Gamma(a, 0) and GPD(a, 0) random variables. To implement the Gibbs 
sampler we first note that within the general setting of subsection 12.11 the conditional 
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density of 9 given (y, z) is independent of y: 7r(0|y, z) = ^(^(z), and the conditional 
density of Z given (y, 9) is 

/(»|y,*) = p{y \^ 9 ^ 6) = UfWv^o) cx n*? c -0+«*W (io) 

It follows that for i < k, conditionally on # and Yi = yi, Zi ~ Gamma(a + 1, /3 + 
independently. This yields the following intertwining sampler, where 9^ denotes the 
current parameter value at iteration m. The next iteration: 

1 . independently simulates each zf 1 ^ from GammatV™) + 1, 0^ + y<) ; 

2. simulates fl( m+1 ) from ir(9\ z ( m+1 )) . 

In such a setting, both (z^ m ') m >o and (9 {m) ) m >o are Markov chains. The simulation step 
of # (m+1) is split into the marginal simulation of a;( m+1 ) and the conditional simulation of 
fl( m + 1 ) given a( m+1 \ Finally, the iteration m+ 1 of our Gibbs sampler: 

1 . independently simulates each zf 1 ^ from GammatV™) + 1, 0^ + y t ) ; 

2. simulates a (m+1) from ir(a\ z (m+1) ) , i.e. from Gamcon iKif/f/, 5') with 5', 77' 
and // computed from z^ m+1 ^ using equation © ; 

3. simulates /?( m+1 ) from Gamma(5'a( m+1 ) + 1, 5'rf) . 

When the equilibrium regime is nearly reached, simulated values of # are approximately 
issued from the posterior distribution of 9 given y. Implementing the previous algorithm 
requires simulating Gamcon II distributions and choosing adequate values of the hyper- 
parameters 5, 77 and li of the priors ir(a) and tt(/3| a). 

2.3 Simulating Gamcon II distributions 

Our sampling scheme involves simulations from Gamcon II (c, d) distributions with c = 
d = rj'/fi', where 7/ and li' are given by and moderate to large values of d = d' 
= 5' = 5 + k. Up to our knowledge, there is no standard algorithm for simulating such 
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distributions. The simulation method that we present is based on a normal approximation 
to Gamcon II distributions using Laplace's method. 

Gamcon II(c, d) can be approximated by a normal distribution having the same mode. 
It is proved in Garrido (2002) that this mode, M Cyd) is the unique root of the equation 

i>(dM Cid + 1) - xP(M c>d ) - hid - lnc = 0, (11) 

where ip denotes the digamma function: the derivative of the logarithm of T(t). The 
standard deviation S c>d of the normal approximant distribution is computed through a 
Taylor expansion of the Gamcon II (c, d) density in a neighborhood of its mode: 

S c d = 1 . (12) 

y/ dip' (M C) d ) - W(dM c>d + 1) 

Garrido (2002) has established that (1 - l/d)/(mc + lnd/2) < M Cjd < 2/ Inc. In practice, 
M C (J is numerically approximated through the bisection method. 

At each iteration, we simulate Gamcon II(c', d 1 ) with the help of the independent 
Hastings-Metropolis algorithm, which requires a suitable proposal density. Actually, it is 
enough to make only one step of Hastings-Metropolis at each iteration of the Gibbs sam- 
pler: see Robert (1998). The proposal density must be as close as possible to the simulated 
density, Gamcon II(c', d'), and have heavier tails to ensure good mixing. Since Gamcon II 
densities have gamma-like tails, we cannot directly take the normal approximant density 
as a proposal. Rather, we have chosen a Cauchy proposal density as close as possible to 
the normal approximant density to Gamcon II(c', d'), i.e. with the same mode and modal 
value. Therefore at each iteration our Hastings-Metropolis step: 

1. independently simulates a new Y from the Cauchy distribution with mode 
M c / td / and modal value l/(S c / id /y/2ir) ; 

/cauchy & AY) 



2. computes the ratio p = min 
the density of Y ; 



wnere /cauchy denotes 



3. takes «( m+1 ) = Y with probability p and «( m+1 ) = otherwise. 
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Since all the transition densities involved are positive, the resulting Markov chain 
(6^) m >Q is ergodic with unique invariant probability measure equal to the posterior dis- 
tribution of given y. Intensive numerical experiments reported in Garrido (2002) show 
that for 5 > 0.5, this Gibbs sampler with one Hastings-Metropolis step at each iteration 
converges quickly to its invariant distribution. Actually, it seems that discarding the first 
500 iterations is sufficient. For 5 < 0.5, we observed numerical instabilities. 

Bayesian inference on the GPD parameters a, f3 is based on the outputs of this algo- 
rithm when it has approximately reached its stationary regime. We then record a sufficient 
number of realizations and f3^ m \ 

2.4 Choice of hyperparameters 

In this paper we suppose that no expert information is available for the choice of priors on 
the GPD parameters (a, (3) and we thus take an empirical Bayes approach. Introduction 
of expert information is discussed in Appendix B. 

We take 5 = 1 both for convenience (for 6 = 1, the prior n(a) reduces to a gamma dis- 
tribution, see below) and because a small value of 5 > indicates little confidence in prior 
information (see Remark 1). Recall that we observed numerical instabilities of our Gibbs 
sampler for 5 < 0.5. A natural approach to compute hyperparameter values of priors ir(a) 
and n(f3 \ a) is to equate some of their location parameters (e.g. the mean) to frequentist 
estimates of a and /3, denoted here a and /3. Recall that the mean of the prior distribution 
7r(/3| a) is (a + 1)/V Taking this mean equal to j3 — x n -k, n , the estimate induced by the 
Hill procedure, and replacing a by its Hill estimate yields rj = (a + l)//3. When 5 = 1, the 
prior ir(a) reduces to Gamma(2, In (77///)). Its mean is 2/ln(r///x). Solving the equation 
where this prior mean is set equal to a yields /x = (a + 1) (exp (— 2/a)) /j3. 

If prior modes are used instead of prior means, a similar approach leads to slightly 
different formulas for r\ and ji. Actually, with prior modes explicit expressions can be ob- 
tained for all 5 > 0. However, preliminary numerical experiments yielded better estimates 
with prior means. 
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2.5 Computation of Bayesian estimates 

When the Gibbs sampler approximately reaches its stationary regime, K values denoted 
(a.( m \ /?( m )), m = 1, . . . , K, are saved. This sample is used to compute posterior means, 
medians or modes to estimate (a, 0), leading to Bayesian estimates for (7, a) and q\- p . 
Remark 2 . - The posterior modes are more difficult to compute since one has first to 
construct smooth estimates of the joint posterior density of a and (3. Numerical experi- 
ments reported in Garrido (2002) for both GPD generated data and excess samples led us 
to keep only posterior medians. □ 
Concerning the estimation of an extreme quantile (y = (yi, ■ ■ ■ ,yk) is a sample of 
excesses over a threshold u), a sample of values q^ p is computed using POT from the 
simulated (a (m \ (3^Ys: 

■np\- 1 / a(m) 



g£J = u + 0™ 



k 



- 1 



(13) 



Means, histograms and credibility intervals can then be computed and represented from f[T3"j) . 
Bayesian credibility intervals for a, (3 and qi- p are obtained by sorting the correspond- 
ing simulated values obtained from the Gibbs sampler. See sections |3] and |4] below. The 
probability distribution of the observed sample y can be estimated either by GPD(o?, /3), 
or by the posterior predictive distribution. Similarly, predictive quantile functions can be 
approximated through 

K 



F~\ Ied (y) ^^Yl F a^,^(y) ■ ( 14 ) 

m=l 

3 Comparative Monte Carlo simulations 

Intensive Monte Carlo simulations were used to compare our Bayes quasi-conjugate esti- 
mators (denoted hereafter Bayes-QC) of 7 and gi_ p with their counterparts when usual 
estimators of GPD parameters are used: Maximum Likelihood (ML), Moment Tail Index 
estimator (MTI(DEdH)), Generalized Zipf estimator (ZipfG) and Probability Weighted 
Moments (PWM). 
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3.1 The simulation design 

Three probability distributions in DA(Frechet) were considered in order to produce various 
excess samples: 

- The Frechet(l) distribution, for which 7 = 1 and the second-order regular variation 
parameter (presented below) p = — 1. The d.f. of Frechet(/3) for (3 > is F(x) = 
exp (— x^ 1 ^), x > 0. 

- The Burr(l, 0.5, 2) distribution, for which 7 = 1 and p = —0.5. The d.f. of Burr(/3, r, A) 
for > 0, r > 0, A > is F(x) = 1 - [/?/(/? + x r )]\ x > 0. 

- The Log-Gamma(2) distribution, for which 7 = 1 and p = 0. The density of Log- 
Gamma^) is f(x) = x _2 (lna;)~ 1 , x > 0. 

The second-order regular variation parameter p (p < 0) indicates the quality of approx- 
imation of F u by a GPD(7, cr(u)) for high values of u and suitable <t(m)'s. High values 
of |p| indicate excellent fitting, whereas values of \p\ close to indicate bad fitting. 
For each one of these three probability distributions, 100 data sets of size n = 500 were 
independently generated. For each simulated data set and each value of k = 5, 10, ... , 495, 
we performed 1 000 iterations of the Gibbs sampler and only the last 500 ones were kept. 

3.2 First results 

For each simulated data set and each value of k = 5, 10, ... , 495 Bayes-QC estimates of 7 
and qi-p, p = 1/5 000, are computed as the medians of the resulting 500 7( m )'s and ^Tp's 
given in the 500 last iterations of the Gibbs sampler. Figures [TH3] display the averages over 
the 100 data sets of these Bayesian estimates of 7 and as functions of k. The means 
and modes were also computed and gave quite similar results. They are not displayed here. 
ML, MTI(DEdH) and ZipfG estimates of 7 and qi- p , p = 1/5 000, were also computed 
for each of those simulated data sets and the same values of k. Figures [TH2] display the 
averages over the 100 data sets of these estimates of 7 and qi- p . 

The left panels of Figures HH3] show that our estimates of 7 based on posterior medians 
(continuous line curves) perform rather well compared to the other ones, and give estimates 
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close to ML and ZipfG. The right panels of Figures [TH3] show that our estimates of are 
very close to those obtained by ML. Both give better estimates than MTI(DEdH) but in 
general do not perform as well as ZipfG, although they seem to be more stable as functions 
of k. Again, credibility intervals for 7 and gi_ p and potential improvements when prior 
information is available are strong arguments supporting the use of our Bayesian procedure. 

3.3 Bayes credibility and Monte-Carlo confidence intervals 

Monte Carlo simulations were also used to study whether Bayesian credibility intervals 
could be used as approximate frequentist confidence intervals for 7 and q\- p . For each 
simulated data set, the last 500 iterations of our Gibbs sampler provide 500 estimates ■y^ 
and 500 estimates q^ p , m = 501, . . . , 1 000. They can be sorted to provide approximate 
90 % credibility intervals for 7 and q\- p . The precision of these credibility intervals was 
studied through Monte Carlo simulations: for each one of the three probability distribu- 
tions considered and for each value of k, we counted the number of simulated data sets 
(out of 100) for which the true values of 7 and gi_ p fell within the corresponding 90 % 
credibility intervals. Figure H] exhibits the coverage rates for each simulated distribution 
and for k = 5, 10, . . . ,495. These credibility intervals are very accurate for the Frechet 
distribution. For the Log-Gamma distribution, the credibility intervals have good coverage 
rates for qi_ p but not for 7. 

This rather unexpected behavior when p = can be explained in terms of penultimate 
approximation, see Diebolt, Guillou and Worms (2003): it can be proved that for p = 0, the 
distribution of excesses is better approximated by a GPD with scale parameter 7 + a^(F), 
where a>k{F) is some correction term, than by a GPD with scale parameter 7 (see, e.g., 
the paper coauthored by Kaufmann, pages 183-190 in Reiss and Thomas (2001) along 
with references therein and Worms (2001)). This explains why in the case p = the 
estimates of 7 strongly deviate from the true value. Furthermore, Diebolt et al. (2003) have 
established for all sufficiently regular estimators (7, a) of the parameters (7, a) such as ML 
or PWM, that when p = the estimated survival function F^^ is a bias-corrected estimate 
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of -F 7itT . We think that this result partially explains the good and stable coverage rates 



observed for qi_ p for the Log-Gamma distribution. These trials show that the credibility 
intervals computed through our procedure give very promising results. 

For each one of the three simulated probability distributions and each value of k G 
{5, 10, ... , 495} the 100 simulated data sets give 100 estimates of (7, q\~ p ). The empirical 
0.05 and 0.95 quantiles of the previous estimates give 90% Monte-Carlo Confidence inter- 
vals (MCCI) for (7,gi- p ). The width of these MCCI's are used in Figure |5] to compare 
the precision of our Bayes Quasi-conjugate estimator to ML, ZipfG and MTI(DEdH) 
estimators. For Burr data sets (left panel of Figure E]) ZipfG gives the most precise quan- 
tile estimators, it is followed by ML, Bayes-QC and MTI(DEdH). For Frechet, Bayes-QC 
and ML have the best precisions. It is worth noting that the Bayes-QC is used here in its 

empirical version. Its precision will increase when combined with expert opinions. 

gamma estimates for Burr(1 ,0.5,2) Quantile estimates for Burr(1 ,0.5,2) 



E 
E 



MTI(DEdH) 
Bayes-QC 





200 300 400 

numb, of excesses k 



200 300 400 

numb, of excesses k 



Figure 1: Means of 7 and q on the 100 Monte Carlo replications from Burr(l, 0.5, 2) for 
different values of the number k of excesses. 



In Figure |6] the average Bayes credibility intervals (averaged over the 100 simulated 
data sets) are compared to the 90% Monte-Carlo Confidence intervals (MCCI) for our 
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gamma estimates for Frechet(1 ) 



Quantile estimates for Frechet(1) 
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Figure 2: Means of 7 and q on the 100 Monte Carlo replications from Frechet(l) for 
different values of k. 

Bayes-QC estimator. For both Burr (left panel) and Frechet (right panel) ditributions 
lower bounds of the average Bayes credibility intervals and MCCI are very close. Upper 
bounds of average Bayes credibility intervals are larger than those of MCCI. It is interesting 
to note that the width of the average Bayes credibility intervals are the narrowest for k 
where Bayes estimate of gi_ p is the closest to the true value qi- p . This could be used 
to chose optimal values of the number of excesses k. Finally, Figure [1] suggests that the 
optimal value of k in the Burr(l, 0.5, 2) case is close to 90. For k = 90, the credibility 
intervals for both 7 and are still satisfactory. 



4 Application to real data sets 

Here, advantages and good performance of our Bayesian estimators are illustrated through 
the analysis of extreme events described by two benchmark real data sets. 

Nidd river data are widely used in extreme value studies (Hosking and Wallis, 1987 
and Davison and Smith, 1990). The raw data consist in 154 exceedances of the level 65 



15 
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Figure 3: Means of 7 and q on the 100 Monte Carlo replications from Log-Gamma(2) for 
different values of k. 

m 3 s _1 by the river Nidd (Yorkshire, England) during the period 1934-1969 (35 years). 
The iV-year return level is the water level which is exceeded on average once in N years. 
Hydrologists need to estimate extreme quantiles in order to predict return levels over long 
periods (250 years, i.e. p = 9 10" 4 , or even 500 years). 

Fire reinsurance data were first studied by Schnieper (1993) and Reiss and Thomas 
(1999). They represent insurance claims exceeding u = 22 millions of Norwegian Kroner 
from 1983 to 1992 (1985 prices are used). 

Many Goodness-of-fit tools (see for example Embrechts et al. 1997) suggested that ex- 
cesses from Nidd river data and Fire reinsurance data are well modeled by GPD probability 
distribution. It was also shown that there are no problems of stationarity violation. 
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coverage rates for gamma 



coverage rates for q_p 




Figure 4: Coverage rates of our 90 % Bayesian credibility intervals for 7 and gi_ p based 
on the 100 Monte Carlo replications for different values of k. 

4.1 Nidd river data 

Bayes quasi-conjugate estimates and related 90 % credibility intervals for 7 and q\- v are 
depicted in Figure [7] for several values of k. Compared to other estimators, our approach 
provides the most stable estimates as k varies. For 8 values of k Grimshaw's algorithm 
for computing ML estimates did not converge: see the broken ML curves in Figure [7J 
Histograms of 7's and gi_ p 's for k = 82 are displayed in Figure [SJ Table [TJ summarizes 
results of the estimation of the 50-year and 100-year return levels of the Nidd river when 
the threshold u is set equal to either 100 (k = 39) or 120 (k = 24). 

Remark 3 . - Recall that the iV-year return level RL^ is the water level which is 
exceeded on average once in N years. Equations relating to the return level RLn 
follow from Davison and Smith (1990): RLn — u — (a/j)[l — (XN)^], where it is assumed 
that the exceedance process is Poisson with annual rate A. If we have observed k excesses 
above the threshold u during 35 years, then A is estimated by /c/35. It follows that RLn 
= qi-p with p ~ k I XNn. For the Nidd river data, this yields p ~ 35 / Nn. Therefore, 
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Comparison of estimators precisions 
Quantiles of Burr(1, 0.5,2) 



Comparison of estimators precisions 
Quantiles of Frechet(1) 
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Figure 5: Width of 90% Monte-Carlo confidence intervals of qi^ p for different values of k. 

estimating the 100-year return level is equivalent to estimating by p = 35/ (100 x 154) 
~2.2710~ 3 . □ 
As shown in Table [U our credibility intervals with level 95 % (see the last column) compare 
well to the Bayesian confidence intervals obtained by Davison and Smith (1990), which are 
based on uniform priors for qi- p , A and 7 (see Smith and Naylor, 1987 for more details). 
Actually, ours are slightly narrower. 

4.2 Fire reinsurance data and net premium estimation 

In the excess-of-loss (XL) reinsurance agreements, the re-insurer pays only for excesses 
over a high value u of the individual claim sizes. The net premium is the expectation 
of the total claim amounts that the re-insurer will pay during the future period [0, T\: 
E(5jv t ) = Yli=x where N T is the random number of claims exceeding u during [0, T] 
and Yi, Y 2 , . . . are the amounts of excesses above u. If the Yj's are modeled by a GPD(7, 
a) and the exceedance arrival process is modeled by a homogeneous Poisson process with 
annual rate A, then the net premium over the coming year is approximated by E(5jv 1 ) — 
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Bayes Interval estimates Bayes Interval estimates 
Quantiles of Burr(1, 0.5,2) Quantiles of Frechet(1) 
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Figure 6: 90 % Bayes credibility and Monte-Carlo confidence intervals for qi_ p for different 
values of k. 




Figure 7: 7 and q\- p for the Nidd river data set with several values of k. 
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Nidd data : (Bayes estim.) 



Nidd data : (Bayes estim.) 
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Figure 8: Histograms of 500 7's and gi_ p 's simulated from the posterior distribution for 
the Nidd river data set, k = 82. 

E(iVi) E(Yj) = Act/ (1— 7). Reiss and Thomas (1999, 2001) estimate the net premium of the 
Norwegian fire claims reinsurance by analysing a data set (see Table |2]) of large Norwegian 
fire claims between 1983 and 1992 (1985 prices). They use a Bayesian inference method 
for the GPD parameters and assume that the exceedance process is Poisson. Independent 
gamma priors are used for A and a and an inverse-gamma prior, with parameters (a, b), is 
chosen for 0. Posterior means of A, a and /3 are computed using Monte Carlo numerical 
approximations of integrals. Table |3] compares estimates of (7, a) and the net premium 
Aer/(1 — 7) obtained by our quasi-conjugate approach with those obtained by Reiss and 
Thomas for ML and Bayesian estimates with different values of the hyperparameters a 
and b of the inverse-gamma. Our approach has the advantage of indicating the precision 
of the estimates. 
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Analysis ML Bayes-QC Uniform Bayes Bayes-QC 



point Return level estimate Credibility Intervals 



50-year return level 














u = 100 


305 


374 


[210, 


775] 


[266, 


672] 


u = 120 


280 


403 


[215, 


850] 


[304, 


690] 


100-year return level 














u = 100 


340 


457 


[220, 


935] 


[306, 


911] 


u = 120 


307 


499 


[225, 


940] 


[354, 


961] 



Table 1: Uniform and quasi-conjugate Bayesian 95 % credibility intervals for 50- year and 
100-year return levels. Nidd river data. 



year 


claim sizes 


year 


claim sizes 




(in millions) 




(in millions) 


1983 


42.719 




23.208 


1984 


105.860 


1990 


37.772 


1986 


29.172 




34.126 




22.654 




27.990 


1987 


61.992 


1992 


53.472 




35.000 




36.269 


1988 


26.891 




31.088 


1989 


25.590 




25.907 




24.130 







Table 2: Norwegian fire claims sizes over 22.0 millions NKr from 1983 to 1992 (1985 prices), 
from Schnieper (1993). 



Analysis 


7 


a 


Net 

Point estimate 


premium 

90 % credib. interv. 


ML for GPD 


0.254 


11.948 


27.23 




Bayes (Reiss and Thomas) 
Inv.-Gamma(a = 2, b = 2) 
Inv.-Gamma(a = 4, b = 6) 


0.288 
0.274 


11.658 
11.814 


27.83 
27.66 




Bayes-QC approach 


0.384 


10.332 


30.03 


[17.09, 84.39] 



Table 3: Bayesian estimates of 7, a and the XL net premium for fire reinsurance data. 
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5 Discussion 



The proposed quasi-conjugate Bayes approach has many advantages when compared to 
standard GPD parameters and extreme quantiles estimators: 

- it can incorporate experts prior information and improve estimation of extreme events 
even when data are scarce, 

- it provides Bayes credibility intervals assessing the quality of the extreme events estima- 
tion, 

- it often gives estimators with weak dependence on the number k of used excesses, 

- the variances of the empirical quasi-conjugate Bayes estimators compares well to the 
variances of the standard estimators. This suggests that quasi-conjugate Bayes estimators 
including experts opinion would give very accurate extreme quantile estimators, this point 
will be illustrated in a forthcoming paper. 

We deeply describe the proposed quasi-conjugate Bayes approach for the most frequent 
case of DA(Frechet) where tails are heavy (7 > 0), the case of DA(Gumbel) is analytically 
discussed in Appendix A. Future work is needed to extend this approach to the general 
case where the user has no prior idea on 7. 

The present paper is the first of a series of papers devoted to various developments 
of the Bayesian inference methodology that we introduced here. In particular, we will 
study how to determine and compute hyperparameters in a hierarchical structure setting 
based on the quasi-conjugate class defined here to take into account realistic expert prior 
information on extreme events. 

Finally, note that it would be possible to include a Poisson parameter for the stream 
of exceedances as in Reiss and Thomas (2000). Also, spatial quantile estimation and mul- 
tivariate or time-series extensions based on our approach are natural and very promising. 

Appendix A 

We present here a brief account of the simple case where Bayesian inference is made 
for exponential distributions, rather than GPD's with both parameters unknown. This 
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simple setup is of interest since it is the Bayesian analogue of the Exponential Tail (ET) 
method (Breiman et al. 1990), and all calculations lead to explicit analytical formulas. 

Set A = l/cr and f\(y) = \ e~ Xy l y>0 . Denote by TT a ,b the prior Gamma(a, b) density 
(a, b > 0). The posterior density 7r a , ! &(A| yi, ■ ■ ■ ,y k ) is Gamma(a + k, b + S k ), where Sk 
= Yli=iHj- Expert information is reflected in the choice of a and b. The corresponding 
posterior predictive distribution is GPD (a + k, b + S k ), with 7 prc d = l/(a + k) and o- pre d 
= (b + S k )/(a + k). Our first estimate of q±- p is based on the posterior mean Ab ayC s = 
(a + k)/(b + S k ) of A: 

b + S k . fk\ 
Sr-^es = u + _ln(-j. 

Our second estimate is based on the posterior predictive distribution: 

k \ !/(«+*) 



<Zi-p,pred = U + (b + S k 



np 



Our third estimate is based on the transformed posterior distribution. Since the posterior 
on a = 1/ A is an inverse-gamma distribution with density 



(b + S k ) a+k 



- exp ] 1 CT>0 



r (a + k) a a + k ^ 

the corresponding distribution of u + a In (k/np) has a similar shape, and has mean 



Ql—p, post 



u + b + s k ln (k_ 

a + k — 1 V 



For k large enough, baycs is close to qi- p , pos t with respect to the standard deviation 
scale, which is of the order of (6 + S k )(a + k)~ 3 / 2 ln (k/np). On the contrary, a Taylor 
expansion shows that when ln (k/np) /(a + A;) is not too large, 



0.1— p, pred 



b + S k . (k 
u + — In — 

a + k \np 



1 + 



ln ( ± 

np 



2(a + k) 

The distance between qi- p , pre d and each of the two other estimates can be significant, and 
Qi-p, pred exhibits a positive bias with respect to the other estimates. We have observed a 
similar behavior when dealing with GPD's: This is the reason why we have discarded the 
analogous of qi- p , prc d in that setting, and selected estimates of based on its posterior 
distribution. 



23 



Appendix B 



We propose here two examples for introducing expert opinion in our Bayesian frame- 
work. In the first one, we use a partial expert opinion: It acts only on one parameter of 
the GPD distribution, whereas the second one is derived from the empirical choice made 



in Subsection 12.41 In the second one, the expert opinion acts on both parameters of the 
GPD distribution. 



Example 1. In this situation, the expert provides a rare value g max of the variable as 
well as an interval [pi, p 2 ] containing the probability p to overpass g max and a (small) 
probability e measuring the uncertainty of this opinion. From Pickands theorem, we 
deduce q^ P2 < g max < q^ Pl where 

'np\ - l / a 



Qi- 



u + /3 



k 



(15) 



Plugging u = x n - k ^ n yields 



np 2 \- 1 / a 



•En—k,n ^ P 



npi \ 
~k~) 



(16) 



Replacing a by its Hill estimate a (similarly to Subsection I2.4j) . we obtain the following 
bounds for (3: 

n - n , n i n Q'max •^n—k,n , n Qmax •^n—k,n 

Pi < (3 < p 2 where p\ = n /A and p 2 ~ 



(np 1 /k)- 1 / & - 1 (np 2 /k)~ 1 / & - 1 ' 

Note that, the converse approach (fixing (3 and bounding a) can also be considered. The 
prior distribution of j3 given a = a is Gamma(<5& + 1, 5rf). Suppose that [0, (3i\ and 
\Pz, +00) both have probability e/2 for this gamma distribution. We thus have two non- 
linear equations permitting to obtain 5 and rj. Approaching the gamma distribution by a 
Gaussian one yields explicit solutions: 



6 



1 

a 



2 , (3i + (3 2 

Z l-e/2 



2az^ £/2 



(3 2 -f3i 

(3i + (3 2 
(/3 2 -/?i) 2 



'l-e/2 



$2 -Pi 



(17) 
(18) 
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where Z\_ £ /2 is the 1 — e/2 quantile of the standard Gaussian distribution. The parameter 
/i is determined by imposing that the gamconII(<5, rj/n) prior distribution of a has mode 
a. From ffTTl) . we obtain: 

/I = r)exp [In 5 + ip(a) -ip(da+l)], (19) 
where ip denotes the digamma function. 

Example 2. Here, the expert provides two rare values g m ax,i and (frnax^ of the variable 
as well as their associated probabilities pi and P2 to be overcome. The confidence on this 
opinion is measured by 5 (see Remark 1). Equation (1151) yields two nonlinear equations 
from which the GPD parameters (ao, /?o) can be computed. This allows to determinate 
the hyperparameters rj and /i. To this end, we impose a and (3 to be respectively the 
modes of the prior distribution of a and f3 given a. From (FTT]) . we obtain 

V = aa/Pa (20) 
/i = ?7exp(ln<5 + ip(a ) - ip(a 5 + 1)), (21) 

where ip denotes the digamma function. 
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